Genomic data on hematologic malignancies from low- and middle-income countries (LMICs) remain sparse. India, with its diverse ancestry, environmental exposures, and regional variability, offers a unique setting to explore how geography influences blood cancer biology. We present the first large-scale genomic epidemiology study of hematologic malignancies in India, comprising 1000 consecutive patients who underwent DNA-based next-generation sequencing (NGS) with a 75-gene myeloid/lymphoid panel. Through unsupervised clustering, hotspot enrichment, and COSMIC signature profiling, we identify reproducible molecular subtypes, region-specific mutations, and environmental mutational processes.

Among 1000 patients (median age 42; 62% male), 72.2% harbored ≥1 pathogenic mutation. Median age at diagnosis was notably younger than global cohorts (AML: 46; MDS: 69). K-means clustering (k=4, silhouette 0.74) revealed four molecular subtypes. Cluster 0 (n=362) was AML-predominant and enriched for NPM1, FLT3-ITD, DNMT3A, and NRAS, with CBFB::MYH11 and RUNX1::RUNX1T1 fusions enriched in East/North India. Cluster 1 (n=146) comprised CML/kinase-driven neoplasms (BCR::ABL1, ABL1, KMT2D) and was overrepresented in North and South India, including a BCR::ABL1 hotspot in the North (OR 3.87, p=0.001). Cluster 2 (n=342) spanned MDS/MPN and was enriched for ASXL1, TET2, SF3B1, U2AF1, and JAK2, with higher prevalence of PML::RARA, PTPN11, and SRSF2 in West and South India. Cluster 3 (n=150) included cytopenias and lymphoid neoplasms and was mutationally quiet, with TET2 underrepresented in East India (OR 0.29, p=0.041), suggesting distinct biology.
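For readers unfamiliar with the silhouette value quoted for the k=4 solution, the following is a minimal sketch of how a mean silhouette coefficient can be computed for a given cluster assignment. The tiny binary mutation matrix, its gene columns, the labels, and the use of Euclidean distance are all illustrative assumptions; the abstract does not describe the feature encoding or distance metric actually used.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all samples (range -1 to 1)."""
    # Group sample indices by cluster label.
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the sample's own cluster.
        a = sum(dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: rows are patients, columns are mutation flags for
# (NPM1, FLT3-ITD, ASXL1, TET2). These values are NOT taken from the cohort.
toy_patients = [
    (1, 1, 0, 0), (1, 1, 0, 0), (1, 0, 0, 0),  # AML-like profiles
    (0, 0, 1, 1), (0, 0, 1, 1), (0, 0, 0, 1),  # MDS/MPN-like profiles
]
good_labels = [0, 0, 0, 1, 1, 1]   # assignment that matches the profiles
mixed_labels = [0, 1, 0, 1, 0, 1]  # assignment that ignores the profiles

good_score = silhouette_score(toy_patients, good_labels)
mixed_score = silhouette_score(toy_patients, mixed_labels)
```

Values close to 1 indicate compact, well-separated clusters, so an assignment that follows the mutation profiles scores higher than one that ignores them.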

Among 362 AML patients, the cohort showed key divergences from Western patterns. The median age was 46 years, nearly two decades younger than in SEER-reported data. NPM1 (34.2%) was the most frequently mutated gene, followed by FLT3-ITD (27.3%), DNMT3A (22.1%), NRAS (14.6%), and IDH2 (11.6%). NPM1+/FLT3-ITD co-mutations were significantly more frequent than in TCGA/BeatAML (17.1% vs. 10.3%, OR 1.79, p=0.018), highlighting a younger, mutation-driven AML biology. CBFB::MYH11 and RUNX1::RUNX1T1 fusions clustered regionally in East/North India, often co-occurring with KIT (10.4%), suggesting a localized core-binding factor AML subtype. Notably, FLT3-ITD was detected in 41% of NPM1-mutated AML, underlining the clinical urgency of early genotyping in these patients. Mutational signature profiling revealed strong enrichment of SBS17a and SBS12, which are linked to oxidative stress and air pollution, especially in intermediate-risk AML from urban industrial zones. In addition, 9.3% of AML patients harbored SBS26 or SBS21, signatures associated with mismatch repair (MMR) deficiency, despite no overt Lynch-like phenotype. These cryptic MMR signatures suggest an unrecognized germline or epigenetic repair defect in younger AML patients in LMICs.
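As a rough check on the NPM1+/FLT3-ITD comparison, an odds ratio can be recomputed from the two reported proportions alone. This is a sketch under the assumption that the OR is the simple ratio of odds between the two cohorts; the abstract does not state the underlying counts or the test used, and the function name is hypothetical.

```python
def odds_ratio(p_cohort, p_reference):
    """Odds ratio for an event with prevalence p_cohort vs. p_reference."""
    for p in (p_cohort, p_reference):
        if not 0.0 < p < 1.0:
            raise ValueError("proportions must lie strictly between 0 and 1")
    odds_cohort = p_cohort / (1.0 - p_cohort)
    odds_reference = p_reference / (1.0 - p_reference)
    return odds_cohort / odds_reference


# Reported co-mutation frequencies: 17.1% (this cohort) vs. 10.3% (TCGA/BeatAML).
or_comutation = odds_ratio(0.171, 0.103)
print(f"OR from rounded percentages: {or_comutation:.2f}")
```

Because the inputs are rounded percentages, the result can differ from the reported OR in the second decimal place. A p-value cannot be recovered this way, since it depends on the group sizes.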

COSMIC signature analysis across the full cohort showed high overall concordance with the OHSU leukemia dataset (94.2% of z-scores within ±2), yet five signatures were significantly enriched in the Indian cohort: SBS17a (28.3% vs. 19.2%, z=2.90) and SBS17b (11.9% vs. 8.6%, z=2.38), linked to oxidative stress and alkylating exposure; SBS12 (30.7% vs. 21.9%, z=2.49), associated with hydrocarbon pollution; and SBS26 (31.3% vs. 22.4%, z=2.48) and SBS21 (33.1% vs. 24.4%, z=2.20), implicating MMR deficiency. SBS26 and SBS21 were particularly enriched in Cluster 2 (MDS/MPN), raising the possibility of underrecognized repair defects in this subgroup. SBS12 enrichment in urban centers implicates industrial air quality as a mutagenic contributor to hematologic oncogenesis.

This is the largest hematologic genomic dataset from any LMIC and the first to map mutational signatures, environmental imprints, and region-specific clusters across a national population. These findings offer critical insights into young-onset AML, pollution-linked leukemogenesis, and localized genetic risk. They support the urgent need for region-adapted diagnostics, environmental surveillance, and a national precision hematology registry across India and similar resource-limited settings. Broader implementation of rapid, population-aware genomics could help decentralize diagnostic access, guide early therapy decisions, and inform public health policies targeting modifiable environmental carcinogens.
